Plotting in R (Base graphics)

Introduction

R has excellent graphics and plotting capabilities. They are mostly found in the following three sources:

+ base graphics
+ the lattice package
+ the ggplot2 package

Base R graphics uses a pen-and-paper model for plotting, while the lattice and ggplot2 packages are built on the routines of the grid graphics system.


Line Charts

First we’ll produce a very simple graph using the values in a treatment vector:

treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
plot(treatment)

Now, let’s add a title, a line to connect the points, and some color:

Here we plot treatment using blue points overlaid by a line. The title is left off here because it is added separately in the next step.

plot(treatment, type="o", col="blue")


Create a title with a red, bold/italic font

title(main="Treatment", col.main="red", font.main=4)

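Put it all together, adding a second series from a control vector (a numeric vector with one value per treatment measurement, which must be defined before this code runs) and fixing the y-axis range so that both series fit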

plot(treatment, type="o", col="blue", ylim=c(0,100))
lines(control, type="o", pch=22, lty=2, col="red")
title(main="Expression Data", col.main="red", font.main=4)
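
The lines() call above relies on a control vector that this section does not define. A minimal sketch, assuming control is a numeric vector with one value per treatment measurement; the values below are invented purely so the example runs and are not the tutorial's data:

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)

# Hypothetical control values, invented for illustration only
control <- c(0, 20, 40, 60, 80, 100)

# Both series share the x positions 1..6, so the lengths must match
stopifnot(length(control) == length(treatment))

# Draw treatment first, then overlay control as a dashed red line
plot(treatment, type = "o", col = "blue", ylim = c(0, 100))
lines(control, type = "o", pch = 22, lty = 2, col = "red")
```

With real data, replace the control line with your own measurements; the rest of the code is unchanged.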


Next let’s change the axis labels to match our data and add a legend. We’ll also compute the y-axis limits using the range function so any changes to our data will be automatically reflected in our graph.

Calculate range from 0 to max value of data

g_range <- range(0, treatment, control)

range() returns a vector containing the minimum and maximum of all the given arguments.
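
As a quick illustration of that behaviour, range() pools all of its arguments into one set of values before taking the minimum and maximum, so passing a literal 0 pulls the lower limit down to zero whenever the data are non-negative. The vectors a and b below are made-up values:

```r
a <- c(3, 7)    # made-up values
b <- c(5, 12)   # made-up values

r1 <- range(a, b)      # 3 12
r0 <- range(0, a, b)   # 0 12

print(r1)
print(r0)
```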


Plot treatment using a y axis that ranges from 0 to the max value in the treatment or control vector. Turn off axes and annotations (axis labels) so we can specify them ourselves.

plot(treatment, type="o", col="blue", 
     ylim=g_range,axes=FALSE, ann=FALSE)


Make x axis using labels

axis(1, at=1:6, labels=c("Mon","Tue","Wed","Thu","Fri","Sat"))


Make the y axis with horizontal labels and a tick mark every 20 units, from 0 up to the top of the range.

axis(2, las=1, at=seq(0, g_range[2], by=20))
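
One way to generate tick positions at a fixed spacing is seq(). The sketch below assumes a y-axis range of 0 to 100 (a stand-in for g_range, not a value taken from the data) and shows that the positions stop at the upper limit:

```r
g_range <- c(0, 100)   # assumed stand-in for the computed range

# Tick positions every 20 units, from 0 up to the top of the range
ticks <- seq(0, g_range[2], by = 20)
print(ticks)           # 0 20 40 60 80 100

# Use the positions on a plot whose default axes are switched off
plot(1:6, seq(0, 100, length.out = 6), type = "o",
     ylim = g_range, axes = FALSE, ann = FALSE)
axis(2, las = 1, at = ticks)
box()
```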


Create box around plot

box()


Add control data, main title and x/y axis titles

lines(control, type="o", pch=22, lty=2, col="red")
title(main="Data", col.main="red", font.main=4)
title(xlab="Days", col.lab=rgb(0,0.5,0))
title(ylab="Values", col.lab=rgb(0,0.5,0))

Create a legend at (1, g_range[2]) that is slightly smaller (cex) and uses the same line colors and points used by the actual plots

legend(1, g_range[2], c("treatment","control"), cex=0.8,
       col=c("blue","red"), pch=21:22, lty=1:2)

Bar Charts

Let’s start with a simple bar chart graphing the treatment vector:

Plot treatment

barplot(treatment)


Let’s now read the data from the example.txt data file, add labels, blue borders around the bars, and density lines:

Read values from tab-delimited example.txt

data <- read.table("data/example.txt", header=TRUE, sep="\t")

names.arg is a vector of names to be plotted below each bar or group of bars.

density is a vector giving the density of shading lines, in lines per inch, for the bars or bar components.

barplot(data$treatment, main="Treatment", xlab="Days",ylab="values", 
        names.arg=c("Mon","Tue","Wed","Thu","Fri","Sat"), 
        border="blue", density=c(10,20,30,40,50,60))
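
The file data/example.txt is not reproduced in this section. To try the commands without it, a small stand-in can be written to a temporary file and read back the same way. This is only a sketch: the column names treatment and control follow the data$treatment and data$control references used later, and the control values are invented.

```r
# Stand-in for the contents of example.txt (control values are invented)
stand_in <- data.frame(
  treatment = c(0.02, 1.8, 17.5, 55, 75.7, 80),
  control   = c(0, 20, 40, 60, 80, 100)
)

# Write a tab-delimited file with a header row to a temporary location
example_path <- tempfile(fileext = ".txt")
write.table(stand_in, example_path, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Read it back exactly as the tutorial reads example.txt
data <- read.table(example_path, header = TRUE, sep = "\t")
str(data)

barplot(data$treatment,
        names.arg = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
```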

Histograms

Let’s start with a simple histogram plotting the distribution of the treatment vector:

Create a histogram for treatment

hist(treatment)  


Concatenate the two vectors

all <- c(data$control, data$treatment)

Create a histogram for data in light blue with the y axis ranging from 0-10

hist(all, col="lightblue", ylim=c(0,10))

Now we can configure the groups in the histogram using the breaks parameter.

For breaks we can supply either a single number giving the suggested number of cells for the histogram (R treats it as a suggestion only) or a vector giving the breakpoints between cells.
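
A small sketch of the two forms, run on the treatment values with plot = FALSE so that hist() returns the computed bins instead of drawing them. With an explicit vector the breakpoints are used exactly; by default each cell includes its right endpoint, and the first cell also includes its left one. The breakpoints chosen here are illustrative.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)

# Form 1: explicit breakpoints, used exactly as given
h_vec <- hist(treatment, breaks = seq(0, 80, by = 20), plot = FALSE)
print(h_vec$breaks)   # 0 20 40 60 80
print(h_vec$counts)   # 3 0 1 2

# Form 2: a single number, which is only a suggestion for the cell count
h_num <- hist(treatment, breaks = 4, plot = FALSE)
print(h_num$breaks)   # breakpoints picked by R; not necessarily 4 cells
```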

Compute the largest value used in the data

max_num <- max(all)

Here we create a histogram with breaks set to max_num, so that each number falls in roughly its own group, and make the x axis range from 0 to max_num.

hist(all, col=heat.colors(max_num), breaks=max_num, 
     xlim=c(0,max_num),main="Histogram", las=1) 


Here we set the freq parameter to FALSE so that the histogram shows probability densities; with freq=TRUE it shows frequencies (counts).

hist(all, breaks=max_num, xlim=c(0,max_num),
     main="Probability Density", las=1, cex.axis=0.8, freq=FALSE)
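
To see what the density scale means, note that with freq = FALSE the bar areas (height times width) sum to 1. A minimal check on the treatment values, with breakpoints chosen here only for illustration:

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)

# hist() draws the plot and also returns the computed bins invisibly
h <- hist(treatment, breaks = seq(0, 80, by = 20), freq = FALSE,
          main = "Probability Density")

print(h$density)

# Area of each bar = density * bin width; the areas add up to 1
total_area <- sum(h$density * diff(h$breaks))
print(total_area)   # 1
```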

Pie Charts

Let’s create a pie chart of the treatment vector with a heading, custom colours, and our own labels:

Create a pie chart with defined heading and custom colours and labels

pie(treatment, main="Treatment", col= c("lightblue", "mistyrose",
                                        "lightcyan","lavender", 
                                        "cornsilk","maroon"),
    labels=c("Mon","Tue","Wed","Thu","Fri","Sat"))  


Now let’s change the colours, label using percentages, and create a legend:

Define some colours suited to black & white print, one for each of the six slices

colors <- c("white","grey70","grey90","grey50","grey30","black")

Calculate the percentage for each day, rounded to one decimal place

treatment_labels <- round(treatment/sum(treatment) * 100, 1)

Append a ‘%’ character to each value

treatment_labels <- paste(treatment_labels, "%", sep="")
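
Worked through on the treatment vector, the two steps give the labels below. paste0() is used here as shorthand for paste(..., sep = ""); note that a value that rounds to zero prints as "0%" rather than "0.0%".

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)

# Share of the total for each value, as a percentage with one decimal place
pct <- round(treatment / sum(treatment) * 100, 1)

# Append the percent sign to each value
treatment_labels <- paste0(pct, "%")
print(treatment_labels)
# "0%" "0.8%" "7.6%" "23.9%" "32.9%" "34.8%"
```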

Create a pie chart with defined heading and custom colors and labels

pie(treatment, main="treatment", col=colors, labels= treatment_labels,
    cex=0.8)

Create a legend at the right

legend(1.5, 0.5, c("Mon","Tue","Wed","Thu","Fri","Sat"), cex=0.8,
       fill=colors) 


Dot Charts

Let’s start with a simple dot chart graphing the data:

Here we use the function t to return the transpose of a matrix.

dotchart(t(data))   
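
Because data is a data frame, t() first converts it to a matrix and then swaps rows and columns, so each original column becomes a row. A small sketch with an invented two-column data frame:

```r
# Invented values, for illustration only
df <- data.frame(treatment = c(1, 2, 3), control = c(4, 5, 6))

m <- t(df)
print(is.matrix(m))   # TRUE
print(dim(m))         # 2 3
print(rownames(m))    # "treatment" "control"

# dotchart() accepts the resulting numeric matrix directly
dotchart(m)
```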


Let’s make the dotchart a little more colorful:

Now we create a colored dotchart for the data with smaller labels

dotchart(t(data), color=c("red","blue","darkgreen"),
         main="Dotchart", cex=0.8)  


Box Plots

The final plot we will look at is a box and whisker plot.

Box plots allow you to quickly review data distributions, showing the median and the 1st and 3rd quartiles.
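
The numbers behind a box can be inspected with boxplot.stats(), which returns the lower whisker, lower hinge, median, upper hinge, and upper whisker (the hinges are close to the 1st and 3rd quartiles), plus any points drawn as outliers. A sketch on made-up values:

```r
# Made-up values: 1 to 9 plus one extreme point
x <- c(1:9, 30)

s <- boxplot.stats(x)
print(s$stats)   # 1.0 3.0 5.5 8.0 9.0
print(s$out)     # 30

# The same five numbers define the box and whiskers drawn here
boxplot(x)
```

The value 30 lies more than 1.5 times the box height above the upper hinge, so it is reported as an outlier and the upper whisker stops at 9.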


First let's read in the gene expression data

exprs <- read.delim("data/gene_data.txt", sep="\t", header=TRUE, row.names=1)
head(exprs)
##                    Untreated1 Untreated2  Treated1   Treated2
## ENSDARG00000093639  0.8616832  1.9311442 0.1041508 0.14055604
## ENSDARG00000094508  0.9857575  2.0256352 0.1549917 0.20301609
## ENSDARG00000095893  0.8498889  1.9875580 0.2317969 0.20925123
## ENSDARG00000095252  0.9242996  2.0857620 0.2562264 0.24669079
## ENSDARG00000078878  0.3571734  0.4653908 0.1167221 0.09710237
## ENSDARG00000079403  1.0604071  1.2581398 0.3884836 0.31567299

Now we can use the boxplot() function on our data.frame to get our boxplot

boxplot(exprs)


Perhaps it would look better on a log scale. We can add additional colours and labels as with other plots.

boxplot(log2(exprs),ylab="log2 Expression",
        col=c("red","red","blue","blue"))

Here we read the same file with read.table(), this time keeping the gene IDs as an ordinary column. The data has two columns each for treated and untreated samples.

data1 <- read.table("data/gene_data.txt", header=TRUE, sep="\t")
head(data1)
##      ensembl_gene_id Untreated1 Untreated2  Treated1   Treated2
## 1 ENSDARG00000093639  0.8616832  1.9311442 0.1041508 0.14055604
## 2 ENSDARG00000094508  0.9857575  2.0256352 0.1549917 0.20301609
## 3 ENSDARG00000095893  0.8498889  1.9875580 0.2317969 0.20925123
## 4 ENSDARG00000095252  0.9242996  2.0857620 0.2562264 0.24669079
## 5 ENSDARG00000078878  0.3571734  0.4653908 0.1167221 0.09710237
## 6 ENSDARG00000079403  1.0604071  1.2581398 0.3884836 0.31567299

Plot several columns of the data frame separately in a 2x2 grid: three histograms and one box plot. Writing one call per column is not very efficient; a for loop can do the same with less code.

par(mfrow=c(2,2))
hist(data1$Untreated1)
hist(data1$Treated2)
hist(data1$Untreated2)
boxplot(data1$Treated1)
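
A minimal sketch of the for-loop version, drawing one histogram per sample column. Since gene_data.txt is not available here, data1 is rebuilt from the first three rows printed by head(data1) above; with the real file, keep the read.table() call instead. The name sample_cols is introduced only for this example.

```r
# Stand-in for data1: the first three rows shown by head(data1)
data1 <- data.frame(
  ensembl_gene_id = c("ENSDARG00000093639", "ENSDARG00000094508",
                      "ENSDARG00000095893"),
  Untreated1 = c(0.8616832, 0.9857575, 0.8498889),
  Untreated2 = c(1.9311442, 2.0256352, 1.9875580),
  Treated1   = c(0.1041508, 0.1549917, 0.2317969),
  Treated2   = c(0.14055604, 0.20301609, 0.20925123)
)

# Columns to plot (everything except the gene ID column)
sample_cols <- c("Untreated1", "Untreated2", "Treated1", "Treated2")

old_par <- par(mfrow = c(2, 2))   # 2x2 grid; keep old settings to restore
for (col_name in sample_cols) {
  hist(data1[[col_name]], main = col_name, xlab = "Expression")
}
par(old_par)                      # back to one plot per page
```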

Saving in bitmap format

bmp(filename = "control.bmp")
plot(control)
dev.off()

Saving in postscript format

postscript(file = "control.ps")
plot(control)
dev.off()
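
The same open, plot, close pattern works for other file devices. A minimal sketch using pdf(), writing to a temporary directory and checking that the file was created; the control values are invented for illustration and the output path is arbitrary:

```r
control <- c(0, 20, 40, 60, 80, 100)   # invented values

out_file <- file.path(tempdir(), "control.pdf")

pdf(file = out_file, width = 6, height = 4)   # open the file device
plot(control, type = "o", col = "red")        # draw into the file
dev.off()                                     # close it; this finalises the file

print(file.exists(out_file))
```

Forgetting dev.off() is a common cause of empty or unreadable output files, because the file is only completed when the device is closed.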

Exercise on base plotting can be found here


Answers for baseplotting can be found here